Probabilistic Principal Component Analysis


Estimation and Imputation in Probabilistic Principal Component Analysis with Missing Not At Random Data

Neural Information Processing Systems

Missing Not At Random (MNAR) values, where the probability of having missing data may depend on the missing value itself, are notoriously difficult to account for in analyses, although they occur frequently in practice. One solution to handling MNAR data is to specify a model for the missing-data mechanism, which makes inference or imputation tasks more complex. Furthermore, this implies a strong a priori assumption on the parametric form of the distribution. However, some works have obtained guarantees on the estimation of parameters in the presence of MNAR data without specifying the distribution of the missing data \citep{mohan2018estimation, tang2003analysis}. This is very useful in practice but is limited to simple cases, such as a few self-masked MNAR variables in data generated according to linear regression models.
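The self-masking mechanism described in this abstract is easy to visualize with a small simulation. The sketch below is entirely illustrative (the logistic selection model, slope, and sample size are chosen here, not taken from the paper); it shows why naive use of the observed values is biased under self-masked MNAR:

```python
import numpy as np

rng = np.random.default_rng(0)

# Self-masked MNAR: the probability that a value is missing depends on the
# value itself (logistic link; slope and sample size are illustrative).
n = 50000
x = rng.normal(loc=0.0, scale=1.0, size=n)
p_miss = 1.0 / (1.0 + np.exp(-3.0 * x))   # large values go missing more often
missing = rng.random(n) < p_miss

# Averaging only the observed values is biased downward, because large
# values are preferentially deleted; this is why the MNAR mechanism must
# be accounted for rather than ignored.
true_mean = x.mean()
observed_mean = x[~missing].mean()
print(true_mean, observed_mean)  # observed mean is clearly below the true mean
```

Under a Missing (Completely) At Random mechanism, by contrast, `p_miss` would not depend on `x` and the observed-data mean would remain unbiased.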


Review for NeurIPS paper: Estimation and Imputation in Probabilistic Principal Component Analysis with Missing Not At Random Data

Neural Information Processing Systems

Additional Feedback: I have read the other reviews and the authors' feedback. With the addition of the recommender system experiment, walking the reader through (1) how the MNAR and PPCA models apply in this setting, (2) how the hyper-parameters for the imputation algorithm are selected, and (3) how the imputations compare with prior algorithms makes a strong case for the proposed method. If the authors re-arrange the paper to improve clarity (as the reviews point out, and as they promise in their feedback), the paper can be substantially stronger. There are a few lingering questions from the reviews that the authors should address in the paper at a minimum: (1) a discussion of a stage-wise approach to imputation (and why that may not be necessary for their sequence of regressions), and (2) given that some of the linear coefficients can be zero, what must a practitioner do when one of the regressions estimates a coefficient close to 0 that is then used in the denominator of other estimates? Even better if the illustration is grounded in an example like movie item ratings.


Review for NeurIPS paper: Estimation and Imputation in Probabilistic Principal Component Analysis with Missing Not At Random Data

Neural Information Processing Systems

There was no consensus among the referees for strongly supporting acceptance, and it was felt that the paper was at best marginal. We hope the reviews will be helpful in developing a stronger version of the paper.


On the Consistency of Maximum Likelihood Estimation of Probabilistic Principal Component Analysis

Neural Information Processing Systems

Probabilistic principal component analysis (PPCA) is currently one of the most widely used statistical tools to reduce the ambient dimension of the data. From multidimensional scaling to the imputation of missing data, PPCA has a broad spectrum of applications ranging from science and engineering to quantitative finance. Despite this broad applicability, it is well known that maximum likelihood estimation (MLE) can only recover the true model parameters up to a rotation. The main obstruction is the inherent unidentifiability of the PPCA model resulting from the rotational symmetry of the parameterization. To resolve this ambiguity, we propose a novel approach using quotient topological spaces; in particular, we show that the maximum likelihood solution is consistent in an appropriate quotient Euclidean space.
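The rotation ambiguity this abstract refers to can be seen directly from Tipping and Bishop's closed-form ML solution for PPCA. In the illustrative sketch below (dimensions, noise level, and seed are chosen here, not taken from the paper), the factor matrix W is recovered only up to a rotation, but the rotation-invariant quantity W Wᵀ is recovered consistently:

```python
import numpy as np

rng = np.random.default_rng(1)

# Generate data from a PPCA model: x = W z + eps, z ~ N(0, I), eps ~ N(0, sigma2 I).
n, d, q, sigma2 = 20000, 6, 2, 0.5
W_true = rng.normal(size=(d, q))
X = rng.normal(size=(n, q)) @ W_true.T + np.sqrt(sigma2) * rng.normal(size=(n, d))

# Closed-form ML solution (Tipping & Bishop): eigendecompose the sample
# covariance; sigma2_hat is the mean of the discarded eigenvalues and
# W_hat = U_q (L_q - sigma2_hat I)^{1/2}, defined only up to a rotation R.
S = np.cov(X, rowvar=False)
evals, evecs = np.linalg.eigh(S)
evals, evecs = evals[::-1], evecs[:, ::-1]        # sort descending
sigma2_hat = evals[q:].mean()
W_hat = evecs[:, :q] @ np.diag(np.sqrt(evals[:q] - sigma2_hat))

# W itself is not identifiable, but W W^T is: W_hat @ W_hat.T approaches
# W_true @ W_true.T as n grows, regardless of the rotation.
gap = np.abs(W_hat @ W_hat.T - W_true @ W_true.T).max()
print(sigma2_hat, gap)
```

Comparing `W_hat` to `W_true` entrywise would show a large discrepancy, which is exactly the rotational symmetry that the quotient-space construction in the paper is designed to factor out.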


A Dual Formulation for Probabilistic Principal Component Analysis

De Plaen, Henri, Suykens, Johan A. K.

arXiv.org Artificial Intelligence

In this paper, we characterize Probabilistic Principal Component Analysis in Hilbert spaces and demonstrate how the optimal solution admits a representation in dual space. This allows us to develop a generative framework for kernel methods. Furthermore, we show how it encompasses Kernel Principal Component Analysis and illustrate its working on a toy and a real dataset. More recently, Restricted Kernel Machines (Suykens, 2017) opened a new door for a probabilistic version of PCA, both in primal and dual form; they essentially use the Fenchel-Young inequality on a variational formulation of KPCA (Suykens et al., 2003; Alaíz et al., 2018) to obtain an energy function.
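A minimal way to see the primal/dual correspondence underlying kernel methods is the linear-kernel case: the component scores obtained from the d×d covariance (primal) and from the n×n Gram matrix (dual) coincide up to a sign per component. The numpy sketch below is illustrative only and is not the paper's Hilbert-space construction:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 5))
Xc = X - X.mean(axis=0)                    # center the data

# Primal PCA: eigendecompose the d x d (scaled) covariance, then project.
C = Xc.T @ Xc
_, V = np.linalg.eigh(C)
primal = Xc @ V[:, ::-1][:, :2]            # scores on the top-2 components

# Dual (linear-kernel) PCA: eigendecompose the n x n Gram matrix;
# the scores are the eigenvectors scaled by sqrt of their eigenvalues.
K = Xc @ Xc.T
L, U = np.linalg.eigh(K)
dual = U[:, ::-1][:, :2] * np.sqrt(L[::-1][:2])

# Identical scores up to an arbitrary sign for each component.
diff = min(np.abs(primal - dual * s).max()
           for s in ([1, 1], [1, -1], [-1, 1], [-1, -1]))
print(diff)
```

With a nonlinear kernel the primal covariance lives in feature space and is no longer directly computable, which is precisely why the dual (Gram-matrix) route is the practical one.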


On Robust Probabilistic Principal Component Analysis using Multivariate $t$-Distributions

Guo, Yiping, Bondell, Howard D.

arXiv.org Machine Learning

Principal Component Analysis (PCA) is a common multivariate statistical analysis method, and Probabilistic Principal Component Analysis (PPCA) is its probabilistic reformulation under the framework of the Gaussian latent variable model. To improve the robustness of PPCA, it has been proposed to change the underlying Gaussian distributions to multivariate $t$-distributions. Based on the representation of the $t$-distribution as a scale mixture of Gaussians, a hierarchical model is used for implementation. However, although the robust PPCA methods work reasonably well for some simulation studies and real data, the hierarchical model implemented does not yield an equivalent interpretation. In this paper, we present a set of equivalent relationships between those models, and discuss the performance of robust PPCA methods using different multivariate $t$-distributed structures through several simulation studies. In doing so, we clarify a current misrepresentation in the literature, and make connections between a set of hierarchical models for robust PPCA.
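The scale-mixture-of-Gaussians representation mentioned in this abstract is concrete enough to sketch: draw a Gamma-distributed mixing variable and divide a Gaussian draw by its square root to obtain multivariate $t$ samples. The dimensions, degrees of freedom, and scale matrix below are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(3)

# Multivariate t via the scale-mixture representation:
# u ~ Gamma(nu/2, rate=nu/2), then x | u ~ N(mu, Sigma / u).
nu, n, d = 5.0, 200000, 3
mu = np.zeros(d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)            # an arbitrary SPD scale matrix

u = rng.gamma(shape=nu / 2, scale=2.0 / nu, size=n)   # rate nu/2 -> scale 2/nu
z = rng.multivariate_normal(mu, Sigma, size=n)
x = mu + z / np.sqrt(u)[:, None]

# Sanity check: for nu > 2 the covariance of the multivariate t is
# Sigma * nu / (nu - 2), not Sigma itself.
emp_cov = np.cov(x, rowvar=False)
err = np.abs(emp_cov - Sigma * nu / (nu - 2)).max()
print(err)
```

This hierarchy (Gamma mixing variable on top of a Gaussian) is exactly the structure that makes EM and Gibbs implementations of robust PPCA tractable.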


Adaptive probabilistic principal component analysis

Farooq, Adam, Raykov, Yordan P., Evers, Luc, Little, Max A.

arXiv.org Machine Learning

Using the linear Gaussian latent variable model as a starting point, we relax some of the constraints it imposes by deriving a nonparametric latent feature Gaussian variable model. This model introduces additional discrete latent variables to the original structure. The Bayesian nonparametric nature of this new model allows it to adapt complexity as more data is observed and to project each data point onto a varying number of subspaces. The linear relationship between the continuous latent and observed variables makes the proposed model straightforward to interpret, resembling a locally adaptive probabilistic PCA (A-PPCA). We propose two alternative Gibbs sampling procedures for inference in the new model and demonstrate its applicability on sensor data for passive health monitoring.
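A hedged caricature of the kind of model this abstract describes: start from the linear Gaussian latent variable model and add binary indicators so that each data point loads on a varying subset of the latent directions. This sketch is illustrative only (fixed dimensions and a simple Bernoulli prior on the indicators), not the paper's Bayesian nonparametric construction:

```python
import numpy as np

rng = np.random.default_rng(4)

# Latent feature Gaussian model sketch: x = W (b * z) + mu + eps,
# where z is continuous and b is a per-point binary on/off indicator.
n, d, q = 1000, 8, 4
W = rng.normal(size=(d, q))
mu = np.zeros(d)
sigma = 0.1

z = rng.normal(size=(n, q))                # continuous latent coordinates
b = rng.random((n, q)) < 0.5               # discrete indicators (illustrative prior)
X = (b * z) @ W.T + mu + sigma * rng.normal(size=(n, d))

# Each row of b records which latent directions that point actually uses,
# so different points effectively live on different subspaces.
active_counts = b.sum(axis=1)
print(X.shape, active_counts.mean())
```

In the Bayesian nonparametric version, the fixed `q` and the Bernoulli prior on `b` would be replaced by a prior (such as an Indian buffet process) that lets the number of active features grow with the data.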